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Abstract 

We develop simple models for the global spread of infectious diseases, emphasizing 
human mobility via air travel and the variation of public health infrastructure from 
region to region. We derive formulas relating the total and peak number of infections 
in two countries to the rate of travel between them and their respective epidemiological 
parameters. 
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1 Introduction and linear model 



Recent outbreaks of atypical pneumonia (SARS) [1] as well as likely future epidemics and 
pandemics of influenza [2] provide ample motivation for the study of the global propagation 
of infectious diseases. (For an overview of modelling of infectious diseases in humans, see 
[3]). In this letter we construct a class of simple models of this phenomena, with particular 
attention to human mobility and the variation of public health capabilities across national 
and regional boundaries. Spatial dynamics involving the diffusion equation, of the type 
studied in [4] (as in, e.g., the spread of rabies in fox populations), is of less interest to 
us than regional or national variation of parameters. We treat each region as spatially 
homogenous and focus on mobility (e.g., via air travel) as the mechanism by which disease 
is transmitted. 

The population variables in our model: S = susceptibles, / = infected, R = recovered 
and D = deceased, are functions of time t and of discrete geographical location labelled by 
an index i — 1, • • • , N. Thus our equations describe the time evolution of a 4N dimensional 
vector: 

/ Si \ 
h 
Ri 

S 2 (1) 
h 



V • / 

In the simplest linear model, the dynamics of x are characterized by a 4N x 4N matrix 
whose entries are probabilities per unit time: t« = probability of transmission of disease 
from an infected to susceptible individual in region i, = probability of recovery of infected 
individual in region i, di — probability of death of infected individual in region i, rrii^j 
= probability of movement of an infected individual from country % to country j. Each of 
these probabilities incorporates several additional parameters that would appear in a more 
complicated model. For example, tj depends on other factors such as the probability and 
effectiveness of quarantine, the length of time an infected individual remains asymptomatic, 
the population density, etc. A very simple modification would be to include asymptomatic 
as well as symptomatic carriers of the infection by expanding our matrix to 5N x 5N 
dimensions. Contagious asymptomatics would have a large transmission probability, as they 
are difficult to quarantine. We will return later to enhancements of the basic model. 
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The basic parameters ri,U, di,rrii^j vary from region to region, and can be estimated 
from epidemiological data. Our organizing principle for dividing the world into regions is 
the relative homogeneity of these parameters within each region. This division may or may 
not follow national boundaries, as the parameters may vary significantly (e.g., from urban 
to rural regions) within a given country. 

The basic reproductive rate [3], commonly denoted TZo, is defined as the average number 
of secondary cases caused by a single infected individual. A simple computation (neglecting 
migration) for region % yields 

K = u + (l-n-dju + {i-n-difu + ■■■ = (—^t) ■ (2) 

\ Ti ~r / 

When TZo > 1 the infected population grows exponentially. 

The system of equations governing the time evolution of x is 

f = Mf + / , (3, 

where J is a possible source of infected individuals coming from zoonosis, an animal reservoir 
of disease. All elements of J are zero except the (4i — 2) and (4i — 3) entries corresponding 
to an animal reservoir in region i which contributes to dli/dt and —dSi/dt. Note that, 
unlike the well-known SIR model [4], equation (3) is linear. The transmission rate per unit 
time is given by ijij rather than t^iSi. This makes our model slightly less realistic, as the 
transmission rate per infected individual does not drop as the number of susceptibles goes to 
zero. The SIR model predicts some fraction of uninfected susceptibles S(t — > oo) even when 
TZq > 1, whereas our model would predict that each individual in the entire population is 
eventually be infected. This leads to an overestimate relative to the SIR model of the total 
number of infected in countries with TZo > 1- However, the difference is only numerically 
significant when TZo is close to 1. (We do not expect the input parameters to be sufficiently 
well-determined that, e.g., a factor of two in the predicted number of infections is very 
significant.) Another method for cutting off the exponential growth is to add (nonlinear) 
logistic terms of the form —kilf to the right hand side of the equation, at the cost of adding 
additional parameters. 

Equation (3) can be solved analytically: 

x = -M- l J + e Mt (xo + M' 1 j) , (4) 

where Xo is the initial condition. The importance of the variables Si,Ri, Di is to impose the 
overall conservation of humans, which is a consequence of the form of M, and the constraint 
that S, I, R, D be positive semi-definite at all times. Although equations (3) and (4) appear 
simple, the constraints lead to a nontrivial system. 
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Below we exhibit the matrix M for the case of two countries, % = 1,2. 



/ -fi 000 o o o \ 

-n -d 1 - m^2 + h m 2 ^i 

o n ooo o oo 

di 000 00 

-t 2 

"^1^2 — r 2 — <i 2 — wi2^i + *2 

000 r 2 00 

v 000 d 2 j 



Note the sum rule implicit in this matrix: J2i x% = constant, which reflects conservation of 
humans. 

Despite the simplicity of these models, they may be useful tools for public health pol- 
icy. For example, the economic impacts on developed country economies from diseases in 
developing nations can be estimated. This data can be used in cost-benefit assessments of 
improvements in public health infrastructure in both developed and developing nations. 



2 Linear model: analytics and simulations 

We now discuss some of the analytic properties of our linear models. First, we note that 
the system of 4N equations can be reduced to a smaller number of equations, due to the 
large number of zeroes in M. In essence, the dynamics involves only the infected individuals 
If. the evolution of the Si, Ri and Di variables are dependent on Jj. Thus there are really 
only N equations (but subject to constraints; see below), as can be seen from the fact that 
M has only N non-zero eigenvalues. The matrix M, restricted to the N x N subspace of 
infected populations Jj, is diagonal up to the (small) off-diagonal mobility entries m^j . If 
the mobility matrix were symmetric, we could diagonalize M via an orthogonal change of 
basis. However, in general m^j ^ rrij^i , so we need to use a bi-linear transformation. That 
is, we can obtain a diagonal matrix Mu 

M D = LMR , (6) 

where L = R~ l = R T up to corrections which vanish for symmetric m^j. In the basis 
y = R~ l x, equation (3) becomes 

LR% = M D y + LJ . (7) 
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While the eigenvalues Aj tell us a great deal - essentially, what would happen in each region if 
it were isolated - there is additional information in the L, R matrices which relate the original 
regional basis to the diagonal basis. Below we investigate explicit scenarios involving a 
developing country i with positive eigenvalue Aj, and a developed country j with eigenvalue 
Aj which is either negative, or, if positive, smaller than Aj. In these scenarios the 
mobility is much more important than the reverse migration TOj_>j, since the spread of disease 
is mainly unidirectional. Hence, we could simply assume = rrij^i without changing 

the results appreciably. In that case, L = R" 1 , so equation (7) becomes uncoupled. 

In the N = 2 case, the non-zero eigenvalues are 

Ai = —r 1 — di — m^ 2 + t 1 

A2 = —ri — d 2 — m 2 ^i + t 2 . (8) 

Note that a positive eigenvalue Aj corresponds to 7Z % > 1. 

Two generic implications can be stated, depending on the sign of Aj. Here we assume 
that the entries of the off-diagonal mobility matrix small compared to the other 

probabilities, so that we can discuss the eigenvalue associated an individual region %. This 
is likely to be the case in any realistic scenario where the mobility matrix reflects air travel. 

A country with positive eigenvalue Aj will experience exponential growth in the number 
of infecteds /j. This growth is only limited by the constraint of total population: eventually 
roughly the entire susceptible population has been infected and after a long period of time 
there are only recovered and dead individuals. (This situation is familiar in the case of 
the common cold or influenza.) There are two important timescales involved: A" 1 , which 
determines the rate of exponential growth, and |Aj — tjl" 1 , (Aj — ij is negative) which gives the 
rate of decline of Jj after saturation occurs and there are no more susceptibles to be infected 
(Si = 0, so effectively the transmission probability ti can be set to zero in the corresponding 
eigenvalue). The outbreak in a highly susceptible country cannot last much longer than the 
sum of these timescales. 

A country with a negative eigenvalue is capable of suppressing any outbreaks of the 
disease. Nevertheless, there may be a steady state number of infections 

i^j \ A 3 ~ M 

due to the animal reservoir J and/or the migration of infected individuals from other regions. 
This equation is valid for small compared to \Xj — \\. (The difference in eigenvalues 

results from the diagonalization of M, and is analogous to the result in quantum mechanics 
for the change in energy eigenstate under a small perturbation to the Hamiltonian.) The 
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most dangerous sources of disease migration are countries i with positive eigenvalue A«. The 
total flux of migration is roughly determined by the mobility m^-, the population Si(0) and 
the timescales A" 1 and \\ — ti\~ x . 

Consider a plausible worst case scenario involving a developing country i with positive Aj 
and much larger population than developed country j which has negative Aj. The developing 
country is the main source of infection for the developed country. In this case, we obtain 
the simple relation 

h - (10) 

\Aj — Ai\ 

relating the number of infected in the two countries at any particular time. The total number 
of infections in country j (during the entire epidemic) is related to Ij(t) in a complicated 
way, depending on recovery and death rates. However, by solving equation (4) we can obtain 
a simple expression for the ratio of total number of infections iVj in the two countries 



to leading order in m^j. The first ratio in this result can be interpreted as the fraction of 
infected individuals who travel from country i to j in the timescale |Aj| _1 , which is roughly 
the "halving" time in country j. As long as the death and recovery rates in the two countries 
are not too different, this can be used as a rule of thumb to estimate the ratio of total number 
of infections. Obviously it behooves country j to make | 1 1 as small as possible. 

We can also obtain simple expressions for the peak number of infected individuals in 
both countries (neglecting animal reservoirs): 

ir k = t { Pi ~ m i 1 + ~t)) - t Pi (12) 

and 





(13) 

where ij(0) is the initial number of infecteds, and is the total population, in country 
i. This gives the maximum capacity required of the medical system during the epidemic. 
Again, our result is likely to be an overestimate relative to a more realistic SIR or logistic 
model. 

Below we describe the results of two illustrative simulations of the linear model. The 
parameters are chosen to be realistic average probabilities per week of transmission, recovery, 
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death and international travel. We integrate equation (3) forward numerically, discretizing 
time in units of weeks, and starting with a random number of infecteds between 1 and 10. 
Country 1 has population 10 9 and country 2 has population 10 8 . Note that the figures display 
the infected population Ii(t) as a function of time, not the total number of individuals who 
have ever been infected. The latter quantity, although not displayed, agrees precisely with 
the result given in equation (11). The peak infected results in equations (12) and (13) are 
also confirmed. 

1) One positive and one negative eigenvalue. The epidemic infects the entire population 
of country 1 (figure (1)), but infects only a small fraction of the population of country 2 
(figure (2)). The parameter values used are ti — 1; r\ — 0.7; d\ = 0.2; t 2 = 0.85; r 2 = 
0.9; d 2 = 0.05; m 12 = 0.00001; m 21 = 0, eigenvalues Ai = 0.09999, A 2 = -0.1. The overall 
results are consistent with our result (11). 

2) Two positive eigenvalues. The epidemic sweeping through country 1 (figure (3)) drives 
rapid growth in country 2 (figures (4), (5)) until it has completed its course. At this point 
slower growth characterized by A 2 (which is very small) resumes. In the end all susceptible 
individuals in both countries are infected. The falloff of I 2 appears abrupt in figure (5) 
because |A 2 — t 2 \ is much larger than A 2 . t\ — 1; r± — 0.75; d\ = 0.2; t 2 = 0.851; r 2 = 
0.8; d 2 = 0.05; m 12 = 0.00001; m 21 = 0, eigenvalues Ai = 0.04999, A 2 = 0.001. 

The numerical values of parameters have been varied only slightly between the two ex- 
amples, but the resulting behaviors are very different, largely because the eigenvalue A 2 has 
changed sign. 



3 Nonlinear models 

To make the models more realistic, we can let the parameters tj, r i; di depend on local quan- 
tities X{. The resulting equations are nonlinear, although still straightforward to simulate 
numerically. For example, it is possible that a region's medical capabilities are overwhelmed 
as the number of infected individuals increases. Indeed, in both the SARS outbreak and the 
influenza of 1918 (Spanish flu) medical personnel were disproportionately affected. 

As a simple example, we take the parameters from simulation 1 in the previous section, 
and let the recovery rate r 2 of country 2 (which had a negative eigenvalue) interpolate 
between the initial value r 2 when I 2 = and r\ when J 2 is large, according to the formula 




(14) 
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This reflects a saturation of medical treatment capabilities after some characteristic number 
of infections. When I 2 is much larger than J| the recovery probability per unit time is 
reduced from its initial value of r 2 to r 2 , and the eigenvalue A 2 may become positive. We 
take J 2 = 2000, r 2 = .9 and r 2 = .7. The results are shown in figures (6) and (7). Figure 
(7) reveals the onset of the nonlinear behavior when J 2 is a few thousand (compare to figure 
(2))- 



4 Internet viruses and worms 

These models can be modified to describe the propagation of computer viruses and worms. 
(From the point of view of our models, there is no essential difference between the two.) A 
virus commonly inserts itself into other program files, in the same manner that a virus in 
nature takes over the workings of normal cells. When the infected program runs, the virus 
code gets a chance to inspect its environment and look for and infect new carriers in the 
form of other program files. A worm is a self-replicating program that does not alter files but 
resides in active memory and duplicates itself by means of computer networks. Worms use 
facilities of an operating system that are meant to be automatic and invisible to the user. 
It is common for worms to be noticed only when their uncontrolled replication consumes 
system resources, slowing or halting other tasks. 

The variation of parameters by region in our model would be a consequence of different 
levels of security administration in home, small business, educational and corporate networks. 

Mobility has no analog here, as the infected computer does not hop from network to 
network. Rather, an infected machine can infect other machines connected to the Internet, 
leading to a non-local (off-diagonal) transmission matrix t^j. Machines on local networks k 
with strong firewalls and virus scanning would be difficult to infect, leading to correspond- 
ingly small entries t^k- However, once a virus or worm has penetrated the firewall, spread 
within a local network can be quite rapid, so the diagonal elements may be quite large. 
Often, the only factor limiting the rate of infection is saturation of available bandwidth by 
worm scanning activity. 

"Slammer", the most successful worm to date, exploited a security hole in Microsoft's 
SQL Server (a database server), and infected 75,000 machines around the world in less than 
a half hour on January 25, 2003 [5]. The initial doubling time of the infected population was 
8.5 seconds, and the growth curve displayed a typical logistic shape. 
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5 Discussion 



We have attempted to model the spread of infectious diseases in a global environment, taking 
into account geographical variation of public health capabilities as well as human mobility 
(i.e., via air travel). We believe our models capture a large range of possible behaviors using 
a minimal set of parameters, each of which can be roughly estimated from data. 

One interesting conclusion from our models is that typical international mobility - the 
probability per unit time of international travel for a given infected individual, estimated 
at rrii^j ~ 1CT 5 per week - is still sufficiently small that a country with well-developed 
public health infrastructure (effectively, a negative eigenvalue A) can resist an epidemic even 
when other more populous countries experience complete saturation. In the quasi-realistic 
simulation 1 (figures (1),(2)), of order 10 5 infections occur in country 2, even though the 
disease has swept completely through country 1. In reaching this conclusion, we kept the 
mobility parameter fixed during the outbreak, and did not assume any draconian quarantine 
on international travellers arriving in country 2. Such measures would reduce the number 
of infections in country 2 considerably. Of course, this conclusion assumes that the public 
health infrastructure in country 2 remains robust during the outbreak. In the nonlinear 
simulation 3 (figures (6), (7)), we see that a breakdown in the medical system can lead to 
grave consequences. 

In the case of two countries, one of which is a "reservoir" with positive eigenvalue Aj and 
the other with negative eigenvalue Xj, a good rule of thumb arising from equation (11) is 
that the ratio of total number of infections in the two countries is given by the fraction of 
infected individuals who migrate in a timescale (Ajl" 1 , which is the "halving" time for the 
epidemic in country j. In our simulation 1, this timescale is about two months, and the 
fractional mobility over that period is ~ 10 -4 , leading to 10 5 infections in country 2 if the 
entire 10 9 population of country 1 is infected. The maximum number of infections at any 
given time in the simulation is 5 • 10 3 (figure (2)). If the medical system of country 2 can 
treat this number of patients without breaking down (entering the nonlinear regime), it can 
prevent a larger outbreak. 

Our formula (11) is not applicable to early SARS data since the eigenvalues have clearly 
been time-dependent during the early stages of the epidemic (estimates of TZq vary widely 
during different time periods [1]). Even developed countries like Canada and Taiwan ex- 
hibited 1Z > 1 during the first months (positive eigenvalues), although as of this writing 
eigenvalues appear to be negative for all countries. 
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Figure 1: Number of infected in country 1, simulation 1. Note saturation and cutoff. 
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Figure 2: Number of infected in country 2 (negative eigenvalue), simulation 1. 
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Figure 3: Number of infected in country 1, simulation 2. Note saturation and cutoff. 
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Figure 4: Number of infected in country 2 (small positive eigenvalue), simulation 2. Rapid 
growth driven by migration from country 1 levels off after epidemic subsides in country 1. 
Subsequent growth in country 2 is much slower. 





Figure 5: Number of infected in country 2 (small positive eigenvalue), simulation 2. This 
graph extends over a longer period of time, exhibiting eventual saturation. 
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Figure 6: Number of infected in country 2, simulation 3. Initially negative eigenvalue be- 
comes positive as number of infected increases, leading to saturation in country 2. 
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Figure 7: Number of infected in country 2, simulation 3. Detail of non-linear region in which 
recovery probability per week falls from .9 to .7. Compare to figure (2). 
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